Channel: Active questions tagged python - Stack Overflow

How to convert a list of dictionaries to a tensorflow dataset?


I have a .jsonl dataset that I am trying to convert into a TensorFlow dataset (tf.data.Dataset).

Each line of the .jsonl is of the form

{"text": "some text", "meta": "irrelevant"}

I need to get it into a tensorflow dataset where each element has a key "text" associated with a tf.string value.

The closest I've gotten is the following:

import tensorflow as tf

ds = tf.data.TextLineDataset('train_mini.jsonl')

def f(tnsr):
    text = eval(tnsr.numpy())['text']
    return tf.constant(text)
    #return {'text':text}

ds = ds.map(lambda x: tf.py_function(func=f,inp=[x], Tout=tf.string))
ds = tf.data.Dataset({"text": list(ds.as_numpy_iterator())})

which throws the following error:

InvalidArgumentError: ValueError: Error converting unicode string while converting Python sequence to Tensor.
Traceback (most recent call last):
  File "/home/crytting/.local/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 241, in __call__
    return func(device, token, args)
  File "/home/crytting/.local/lib/python3.6/site-packages/tensorflow/python/ops/script_ops.py", line 130, in __call__
    ret = self._func(*args)
  File "/home/crytting/.local/lib/python3.6/site-packages/tensorflow/python/autograph/impl/api.py", line 309, in wrapper
    return func(*args, **kwargs)
  File "/home/crytting/persuasion/json_to_tfds.py", line 7, in f
    return tf.constant(text)
  File "/home/crytting/.local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 262, in constant
    allow_broadcast=True)
  File "/home/crytting/.local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 270, in _constant_impl
    t = convert_to_eager_tensor(value, ctx, dtype)
  File "/home/crytting/.local/lib/python3.6/site-packages/tensorflow/python/framework/constant_op.py", line 96, in convert_to_eager_tensor
    return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Error converting unicode string while converting Python sequence to Tensor.

         [[{{node EagerPyFunc}}]]

I have tried many many ways of doing this, but nothing has worked. It seems like it shouldn't be this hard, and I'm wondering if I'm missing some really simple way of doing it.
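
For reference, here is a minimal sketch of one way this could be done. It is not taken from the attempt above. It assumes the file fits in memory and that every non-blank line is valid JSON with a "text" key. It parses in plain Python with json.loads instead of eval, then builds the dataset with tf.data.Dataset.from_tensor_slices. The helper names parse_texts, to_dataset and load_text_dataset are made up for illustration.

```python
import json


def parse_texts(lines):
    """Collect the "text" field from an iterable of JSON Lines strings."""
    texts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        texts.append(json.loads(line)["text"])
    return texts


def to_dataset(texts):
    """Wrap a list of Python strings in a dataset of {"text": tf.string}."""
    # Imported here so parse_texts stays usable without TensorFlow installed.
    import tensorflow as tf

    return tf.data.Dataset.from_tensor_slices({"text": texts})


def load_text_dataset(path):
    """Read a .jsonl file eagerly and return the dataset (assumes it fits in memory)."""
    with open(path, encoding="utf-8") as fh:
        return to_dataset(parse_texts(fh))
```

Calling load_text_dataset('train_mini.jsonl') should then give a dataset whose elements are dictionaries with a single "text" key holding a scalar tf.string tensor. Because parsing happens before TensorFlow is involved, no tf.py_function is needed. The trade-off is that the whole file is read up front rather than streamed.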

